Task Completion Transfer Learning for Reward Inference

Authors

Layla El Asri

Romain Laroche

Olivier Pietquin

Abstract

Reinforcement learning-based spoken dialogue systems aim to compute an optimal dialogue management strategy from interactions with users. They compare candidate management strategies on the basis of a numerical reward function. Reward inference consists of learning a reward function from dialogues scored by users. A major issue for reward inference algorithms is that some parameters that strongly influence user evaluations cannot be computed online. Task completion is one such parameter. This paper introduces Task Completion Transfer Learning (TCTL), a method that exploits exact knowledge of task completion on a corpus of dialogues scored by users in order to optimise online learning. Compared to previously proposed reward inference techniques, TCTL returns a reward function that can also handle the online non-observability of task completion. A reward function is learnt with TCTL on dialogues with a restaurant-seeking system. It is shown that the reward function returned by TCTL is a better estimator of dialogue performance than the one returned by reward inference.
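The abstract does not spell out how a reward function is learnt from scored dialogues or how a feature that is unobservable online is handled. As a minimal sketch, assuming reward inference is a linear least-squares regression of user scores on per-dialogue features (an illustrative assumption, not the estimator used in the paper), the Python below fits weights on a corpus where task completion is known and then, online, substitutes the corpus completion rate for the unobservable completion flag. That substitution is a deliberately crude stand-in and is not TCTL; the features (`num_turns`, `task_completed`), the toy corpus, and all function names are hypothetical.

```python
"""Toy reward inference: regress user scores on dialogue features (illustrative only)."""
from typing import List, Sequence

# Hypothetical toy corpus: each dialogue is (num_turns, task_completed), where
# task_completed is known offline (in the scored corpus) but not observable online.
CORPUS_FEATURES: List[List[float]] = [[4, 1], [6, 1], [8, 0], [10, 0], [5, 0], [7, 1]]
# Hypothetical user scores, generated as 5 - 0.2 * num_turns + 2 * task_completed.
CORPUS_SCORES: List[float] = [6.2, 5.8, 3.4, 3.0, 4.0, 5.6]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_reward(features: Sequence[Sequence[float]], scores: Sequence[float]) -> List[float]:
    """Least-squares fit of score ~ w0 + w . features via the normal equations."""
    rows = [[1.0, *f] for f in features]  # prepend a bias column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * s for r, s in zip(rows, scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear reward: bias plus weighted sum of features."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def completion_rate(features: Sequence[Sequence[float]]) -> float:
    """Fraction of corpus dialogues whose task was completed (offline knowledge)."""
    return sum(f[1] for f in features) / len(features)


def online_reward(weights: Sequence[float], num_turns: float, p_completion: float) -> float:
    """Online reward when task completion is unobservable: plug in its expected value."""
    return reward(weights, [num_turns, p_completion])


if __name__ == "__main__":
    weights = fit_reward(CORPUS_FEATURES, CORPUS_SCORES)
    print("weights (bias, num_turns, task_completed):", [round(w, 3) for w in weights])
    p = completion_rate(CORPUS_FEATURES)
    print("online reward for a 6-turn dialogue:", round(online_reward(weights, 6, p), 3))
```

Because the model is linear in its features, plugging in the expected completion gives the expected reward under that completion probability; a better online estimate of completion would only change the value passed as `p_completion`.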

Similar resources

Policy Transfer using Reward Shaping

Transfer learning has proven to be a wildly successful approach for speeding up reinforcement learning. Techniques often use low-level information obtained in the source task to achieve successful transfer in the target task. Yet, a most general transfer approach can only assume access to the output of the learning algorithm in the source task, i.e. the learned policy, enabling transfer irrespe...

Transfer Reinforcement Learning with Shared Dynamics

This article addresses a particular Transfer Reinforcement Learning (RL) problem: when dynamics do not change from one task to another, and only the reward function does. Our method relies on two ideas, the first one is that transition samples obtained from a task can be reused to learn on any other task: an immediate reward estimator is learnt in a supervised fashion and for each sample, the r...

When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other interval of task performance), what (the object...

A pallidus-habenula-dopamine pathway signals inferred stimulus values.

The reward value of a stimulus can be learned through two distinct mechanisms: reinforcement learning through repeated stimulus-reward pairings and abstract inference based on knowledge of the task at hand. The reinforcement mechanism is often identified with midbrain dopamine neurons. Here we show that a neural pathway controlling the dopamine system does not rely exclusively on either stimulu...

The Comparative Effect of Task Type and Learning Conditions on the Achievement of Specific Target Forms

The completion mode (individual, collaborative) of the tasks and the conditions under which these modes are performed have been reported to play an important role in language learning. The present study aimed to investigate the effects of employing text editing tasks performed both individually and collaboratively, on the achievement of English grammar under explicit and implicit learning condi...

Publication date: 2014